Turbo-code decoder

ABSTRACT

The present invention provides a turbo-code decoder that adopts a parallel and systolic-array VLSI structure. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are fully achieved. The latency is only N+M+2 units of time, where N is the block length and M is the register size; this is about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times higher than that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times higher than in the conventional decoder, VLSI technology has improved steadily, so the hardware complexity is easy to overcome. Trading hardware cost for higher speed will remain the trend.

BACKGROUND OF THE INVENTION

[0001] 1. Field of Invention

[0002] The present invention generally relates to a decoder, and more particularly, to a fast turbo-code decoder. The decoder is designed with systolic-array very-large-scale integrated (VLSI) circuits, in which the output of each level is used as the input of the next level. Thus, the advantages of both parallel and pipelined calculation are fully achieved. The decoding speed is markedly improved compared with that of the conventional decoder: it is about 5*(N+M) times faster, where N stands for the block length and M stands for the register size.

[0003] 2. Description of Related Art

[0004] Error control coding is widely used in communication systems and computer storage media. Berrou, Glavieux and Thitimajshima first proposed the turbo-code, whose error-correcting capability approaches the Shannon limit, in 1993 (C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993). Because of its excellent error-correcting capability, the turbo-code is widely applied in general communication systems such as the CDMA transmission system. However, with the conventional decoding algorithm, if the transmission block length is too small, the error-correcting capability is poor. On the other hand, if the transmission block length is too large, the decoding delay is too long to tolerate in a communication system that requires real-time processing. Therefore, it is important to solve this problem to fulfill the requirements of current high-speed communication.

SUMMARY OF THE INVENTION

[0005] To solve the problem mentioned above, and to increase the computing speed and thus the throughput, the present invention provides a structure design using parallel and systolic-array VLSI.

[0006] In the structure design mentioned above, the decoder is designed with systolic-array VLSI circuits. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are fully achieved. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times higher than that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times higher than in the conventional decoder, VLSI technology has improved steadily, so the hardware complexity is easy to overcome. Trading hardware cost for higher speed will remain the trend.

[0007] In order to achieve the objective mentioned above, the present invention uses a parallel and systolic-array VLSI structure design to provide a turbo-code decoder for a communication system. The decoder comprises a serial-to-parallel output unit and a plurality of parallel decoding units. The serial-to-parallel output unit receives a serial input signal, converts it, and outputs a parallel signal. The parallel decoding units are serially connected to form a plurality of levels. The first-level parallel decoding unit receives the parallel signal output from the serial-to-parallel output unit. The output of the first-level parallel decoding unit is sent to the second-level parallel decoding unit, and in this sequence the parallel signal passes through the parallel decoding units for the decoding process.

[0008] In the turbo-code decoder mentioned above, each parallel decoding unit receives an extrinsic parameter when performing the decoding process, produces a new extrinsic parameter as the result of its decoding process, and sends that extrinsic parameter to the next-level parallel decoding unit.

[0009] In the turbo-code decoder mentioned above, the extrinsic parameter is obtained from a deinterleaving operation. The extrinsic parameter of the first-level parallel decoding unit is L_(a0,k)=(0, 0, . . . , 0), where k=1, 2, . . . , N, and N is the block length of the turbo-code.

[0010] In the turbo-code decoder mentioned above, the serial input signals are the r_(1s,k), r_(1p,k), and r_(2p,k) messages of the turbo-code, where k=1, 2, . . . , N, and N is the block length of the turbo-code.

[0011] In the turbo-code decoder mentioned above, the serial-to-parallel output unit receives r_(1s,k), r_(1p,k), and r_(2p,k), where the subscript k=0, 1, . . . , N+M−1 covers the whole block and the end message, and M stands for the register size of the turbo-code decoder. The serial-to-parallel output unit converts the received r_(1s,k), r_(1p,k), and r_(2p,k) messages and outputs the results to the first-level parallel decoding unit in parallel. The first-level parallel decoding unit also receives an extrinsic parameter L_(a,k) at the same time. L_(a,k) is the parameter obtained via a deinterleaving operation on the previous-level extrinsic parameter Λ(d_(k)). The initial value of the extrinsic parameter for the first-level decoding unit is set as L_(a0,k)=(0, 0, . . . , 0), and a first-level extrinsic parameter L_(a1,k) is generated by the first-level parallel decoding unit. The messages r_(1s,k), r_(1p,k), and r_(2p,k) are passed through sequentially to be the input of the next level.

[0012] In the turbo-code decoder mentioned above, each parallel decoding unit comprises a first decoder, a second decoder, an interleaving unit, and a deinterleaving unit. The first decoder receives the r_(1s,k) and r_(1p,k) messages and the extrinsic parameter L_(a,k). The second decoder receives the r_(2p,k) message and the extrinsic parameter L_(a,k). The interleaving unit is located between the first decoder and the second decoder and receives the output of the first decoder. The deinterleaving unit is connected to the second decoder and alternately outputs the output of the first decoder and the second decoder.

[0013] In the turbo-code decoder mentioned above, the first decoder of each parallel decoding unit constitutes a systolic-array VLSI circuit structure.

[0014] In the turbo-code decoder mentioned above, the systolic-array VLSI circuit is composed of N+M units each of modules C, A, B, D, and E. Module C receives L_(a1,k), r_(1s,k) and r_(1p,k), and outputs γ_(k) ⁽¹⁾(m′,m) and γ_(k) ⁽⁰⁾(m′,m). Module A calculates a forward recursive probability parameter α_(k). Module B calculates a backward recursive probability parameter β_(k). Module D uses N+M units in parallel to obtain Λ(d_(k)) after the calculation of α_(k), β_(k), and γ_(k) ^((i)) is finished. Module E outputs the values calculated by module D, for k=0, 1, . . . , N+M−1.

[0015] In the turbo-code decoder mentioned above, the value of Λ(d_(k)) is calculated according to a MAP algorithm and the following equation:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}$$

[0016] Here, α_(k) is the forward recursive probability parameter, β_(k) is the backward recursive probability parameter, and γ_(k) ^((i)) is a branch probability parameter.

[0017] In the turbo-code decoder mentioned above, the forward recursive probability parameter α_(k) is obtained from the previous parameter α_(k−1) and the branch probability parameter γ_(k) ^((i)) by the following equation:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}$$

[0018] In the turbo-code decoder mentioned above, the backward recursive probability parameter β_(k) is obtained from the next parameter β_(k+1) and the branch probability parameter γ_(k+1) ^((i)) by the following equation:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\beta_{k+1}(m')}$$

[0019] In the turbo-code decoder mentioned above, the branch probability parameter γ_(k) ^((i)) is obtained from the following equation according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') \cdot p(r_{1p,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') \cdot q(d_k=i \mid s_k=m,\, s_{k-1}=m') \cdot \Pr\{s_k=m \mid s_{k-1}=m'\}$$

[0020] Here, the probability parameter q(d_(k)=i|s_(k)=m,s_(k−1)=m′) is either 0 or 1, depending on whether the input bit d_(k)=i (0 or 1) is the bit associated with the transition from state m′ to state m.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention, and together with the description, serve to explain the principles of the invention. In the drawings,

[0022] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC encoders;

[0023] FIG. 2 schematically shows the decoding structure of the turbo-code;

[0024] FIG. 3 schematically shows the structure of the P-level parallel decoding units (Level 1, Level 2, . . . , Level P);

[0025] FIG. 4 schematically shows the structure of the first-level decoding unit of the parallel decoding units in FIG. 3;

[0026] FIG. 5 schematically shows the structure of the systolic-array VLSI that makes up the first-level decoding unit of the parallel decoding units in FIG. 4;

[0027] FIG. 6 schematically shows the structure of the simplified modules, data streams, and latches of the parallel decoding units in FIG. 3 when N=4 and M=3;

[0028] FIG. 7 schematically shows the calculation structure of the branch probability parameter γ_(k) ^((i))(m′, m);

[0029] FIG. 8 schematically shows the structure of module A for calculating α_(k);

[0030] FIG. 9 schematically shows the structure of module B for calculating β_(k);

[0031] FIG. 10 schematically shows the structure of module D for calculating Λ(d_(k));

[0032] FIG. 11 schematically shows the structure of the calculation submodule L (using an analog circuit);

[0033] FIG. 12 schematically shows the structure of the fast RSC encoder, where G_(b)=1011 and G_(d)=1010;

[0034] FIG. 13 schematically shows the trellis diagram;

[0035] FIG. 14 schematically shows the detailed structure of module A (where the submodule L is designed as a digital circuit);

[0036] FIG. 15 schematically shows the detailed structure of module D;

[0037] FIG. 16 schematically shows the latency for completing a message of one block length; and

[0038] FIG. 17 schematically shows the comparison of the bit error rate, where the number of decoding iterations P=6, the code rate R=1/3, the register size M=3, the generator parameters are G_(b)=1011 and G_(d)=1110, and the 256*256 random interleaving method is adopted for the first decoder and the second decoder.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0039] The present invention provides a structure design adopting parallel and systolic-array VLSI, in which the decoder is designed with systolic-array VLSI circuits. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are fully achieved. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times higher than that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times higher than in the conventional decoder, VLSI technology has improved steadily, so the hardware complexity is easy to overcome. Trading hardware cost for higher speed will remain the trend.

[0040] Berrou, Glavieux and Thitimajshima first proposed the turbo-code, whose error-correcting capability approaches the Shannon limit, in 1993 (C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993). The encoding structure comprises two parallel recursive systematic convolutional (hereafter abbreviated as RSC) encoders. Its important characteristics are: (1) two convolutional codes with the same structure are encoded in parallel, so the receiving end is able to decode the message iteratively; (2) the minimum distance between code words is increased by using non-uniform random interleaving (S. Benedetto and G. Montorsi, “Role of Recursive Convolutional Codes in Turbo Codes,” Electron. Lett., Vol. 31, No. 11, pp. 858-859, 1995); and (3) soft-in soft-out decoding.

[0041] Because of the characteristics mentioned above, the error-correcting capability is excellent. Due to this excellent error-correcting capability, the turbo-code is widely applied in general communication systems such as the CDMA transmission system (J. Blaanz, P. Jung, and M. Naßhan, “Realistic Simulations of CDMA Mobile Radio Systems Using Joint Detection and Coherent Receiver Antenna Diversity,” IEEE Third International Symposium on Spread Spectrum Techniques and Applications, Oulu, Finland, 1994).

[0042] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC encoders. The input bit sequence is represented as d=(d₁, d₂, d₃, . . . , d_(k), . . . , d_(N)), where d_(k) is the input bit of the encoder at time k, k runs from 1 to N, and N is the block size. The output of the encoder at time k is represented as c_(k)=(x_(k), y_(1k), y_(2k)). Since the encoder is systematic, x_(k)=d_(k), and the redundant code outputs are y_(1k) and y_(2k). The decoding structure of the turbo-code is shown in FIG. 2. The decoder 200 comprises two recursive decoding units 210 and 220, which are connected through the interleaving and deinterleaving units shown as 212, 214 and 216 in the diagram.
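The interleaving and deinterleaving operations that connect the two decoding units can be illustrated with a minimal Python sketch. It assumes a pseudo-random permutation; the helper names `make_interleaver`, `interleave` and `deinterleave` and the seed are hypothetical and are not taken from the drawings.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n) (hypothetical helper)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(seq, perm):
    """Output position j takes the element at input position perm[j]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse of interleave: element j goes back to position perm[j]."""
    out = [None] * len(seq)
    for j, p in enumerate(perm):
        out[p] = seq[j]
    return out


perm = make_interleaver(8, seed=1)
d = [1, 0, 1, 1, 0, 0, 1, 0]
restored = deinterleave(interleave(d, perm), perm)
```

Deinterleaving with the same permutation restores the original order, which is the property the decoder relies on when extrinsic values are passed between the two decoders.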

[0043] It is assumed that the noise in the communication channel is Gaussian. It is further assumed that the noise on each transmitted symbol is independent, with expectation value 0 and variance N₀/2. Binary modulation is used: if the input bit d_(k) is 0, the modulated value is −1.0; if the input bit d_(k) is 1, the modulated value is +1.0. Therefore, the received vector sequence R is represented as R=(r₁, r₂, r₃, . . . , r_(k), . . . , r_(N)), and the kth symbol is represented as

$$r_k = (r_{1s,k},\, r_{1p,k},\, r_{2p,k}) = (2x_k - 1 + n_{1s,k},\; 2y_{1k} - 1 + n_{1p,k},\; 2y_{2k} - 1 + n_{2p,k})$$
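As an illustration of this channel model, the following Python sketch maps the three code bits to ±1.0 and adds independent zero-mean Gaussian noise of variance N₀/2. The function names `modulate` and `received_symbol`, the parameter `n0` and the chosen seed are assumptions made only for this example.

```python
import math
import random


def modulate(bit):
    """Map bit 0 to -1.0 and bit 1 to +1.0, i.e. 2*bit - 1."""
    return 2.0 * bit - 1.0


def received_symbol(x, y1, y2, n0, rng):
    """Return (r_1s, r_1p, r_2p) for one time step k."""
    sigma = math.sqrt(n0 / 2.0)  # standard deviation for variance N0/2
    # Each component gets its own independent noise sample.
    return tuple(modulate(b) + rng.gauss(0.0, sigma) for b in (x, y1, y2))


rng = random.Random(0)  # fixed seed, chosen only for repeatability
r_k = received_symbol(1, 0, 1, n0=0.5, rng=rng)
```

With `n0=0.0` the function returns the noiseless symbol, for example (+1.0, −1.0, +1.0) for the bits (1, 0, 1).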

[0044] Here, n_(1s,k), n_(1p,k), and n_(2p,k) are the noise terms of the channels r_(1s), r_(1p), and r_(2p) at time k respectively, and they are independent of each other. The details of the Maximum A Posteriori (hereafter abbreviated as MAP) algorithm proposed by BCJR (L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Trans. Inform. Theory, Vol. 20, pp. 284-287, March 1974) are not repeated here; only the result of the MAP algorithm is described. The objective of the MAP algorithm is to calculate, for each input bit d_(k), the log ratio of the A Posteriori Probability (hereafter abbreviated as APP) that the bit is 1 to the APP that the bit is 0, where k=0, 1, 2, . . . , N−1. From the derivation for the turbo-code proposed by Berrou, Glavieux and Thitimajshima mentioned above, the following equation is obtained:

$$\Lambda(d_k) = \log \frac{\sum_{m}\sum_{m'} \gamma_k^{(1)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)}{\sum_{m}\sum_{m'} \gamma_k^{(0)}(m',m)\,\alpha_{k-1}(m')\,\beta_k(m)} \qquad (1)$$

[0045] Here, α_(k) is the forward recursive probability parameter, β_(k) is the backward recursive probability parameter, and γ_(k) ^((i)) is the branch probability parameter. As the name suggests, the forward recursive probability parameter α_(k) can be obtained from the previous parameter α_(k−1) and the branch probability parameter γ_(k) ^((i)) by the following equation:

$$\alpha_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_k^{(i)}(m',m)\,\alpha_{k-1}(m')} \qquad (2)$$

[0046] The backward recursive probability parameter β_(k) can be obtained from the next parameter β_(k+1) and the branch probability parameter γ_(k+1) ^((i)) by the following equation:

$$\beta_k(m) = \frac{\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m)\,\beta_{k+1}(m')} \qquad (3)$$

[0047] The branch probability parameter γ_(k) ^((i)) is obtained from the following equation according to the MAP algorithm:

$$\gamma_k^{(i)}(m',m) = p(r_{1s,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') \cdot p(r_{1p,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') \cdot q(d_k=i \mid s_k=m,\, s_{k-1}=m') \cdot \Pr\{s_k=m \mid s_{k-1}=m'\} \qquad (4)$$

[0048] Here, the probability parameter q(d_(k)=i|s_(k)=m,s_(k−1)=m′) is either 0 or 1, depending on whether the input bit d_(k)=i (0 or 1) is the bit associated with the transition from state m′ to state m.
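The recursions of equations (1) to (3) can be sketched in software as follows. This is a sequential illustration under stated assumptions, not the systolic hardware of the invention. The function name `map_llr`, the array layout `gamma[k-1][i][mp][m]` (branch value at step k for bit i on the transition from state mp to state m) and the boundary vectors `alpha0` and `beta_end` are hypothetical. The backward sum is taken over successor states, which is one reading of equation (3).

```python
import math


def normalize(v):
    """Scale a vector so its entries sum to 1 (denominators of eqs. (2), (3))."""
    s = sum(v)
    return [x / s for x in v]


def map_llr(gamma, alpha0, beta_end):
    """Return [Lambda(d_1), ..., Lambda(d_K)] from branch values gamma."""
    K, S = len(gamma), len(alpha0)
    # Forward recursion, equation (2).
    alpha = [list(alpha0)]
    for k in range(1, K + 1):
        g = gamma[k - 1]
        a = [sum((g[0][mp][m] + g[1][mp][m]) * alpha[k - 1][mp]
                 for mp in range(S)) for m in range(S)]
        alpha.append(normalize(a))
    # Backward recursion, equation (3); gamma[k] holds the step k+1 values.
    beta = [None] * (K + 1)
    beta[K] = list(beta_end)
    for k in range(K - 1, -1, -1):
        g = gamma[k]
        b = [sum((g[0][m][mn] + g[1][m][mn]) * beta[k + 1][mn]
                 for mn in range(S)) for m in range(S)]
        beta[k] = normalize(b)
    # Log-likelihood ratio, equation (1); assumes both sums are nonzero.
    llr = []
    for k in range(1, K + 1):
        g = gamma[k - 1]
        num = sum(g[1][mp][m] * alpha[k - 1][mp] * beta[k][m]
                  for mp in range(S) for m in range(S))
        den = sum(g[0][mp][m] * alpha[k - 1][mp] * beta[k][m]
                  for mp in range(S) for m in range(S))
        llr.append(math.log(num / den))
    return llr


# Degenerate one-state trellis: each Lambda reduces to log(gamma1 / gamma0).
toy_gamma = [[[[0.2]], [[0.8]]], [[[0.6]], [[0.3]]]]
toy_llr = map_llr(toy_gamma, alpha0=[1.0], beta_end=[1.0])
```

In the one-state toy case the forward and backward terms cancel, so each output is simply the log ratio of the two branch values at that step; this gives a quick sanity check of the indexing.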

[0049] In a sequential-calculation decoder, it is assumed that computing each Λ(d_(k)) in equation (1) takes one unit of time, where k runs from 0 to N+M−1, N stands for the transmission block length, and M stands for the register size of the decoder. It is further assumed that each of α_(k), β_(k), and γ_(k) ^((i)) in equations (2), (3), and (4) also takes one unit of time, where i=0 or 1. With five quantities (Λ, α, β, γ⁽⁰⁾ and γ⁽¹⁾) for each of the N+M time steps, the first-level decoder therefore needs 5*(N+M) units of time. With a decoding algorithm such as the Viterbi algorithm (A. J. Viterbi, “Error Bound for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm,” IEEE Trans. Inform. Theory, vol. IT-13, pp. 260-269, April 1967) (A. J. Viterbi and J. K. Omura, “Principles of Digital Communication and Coding,” New York: McGraw-Hill, 1979) or the BCJR algorithm mentioned above, if N is too small, the error-correcting capability is poor. However, if N is too large, the decoding delay is too long to tolerate in a communication system that requires real-time processing.

[0050] As mentioned in the previous paragraph, the current decoding algorithm decides each bit from the value of Λ(d_(k)) in equation (1): if Λ(d_(k))>0, then d_(k)=1; otherwise, d_(k)=0. To calculate each Λ(d_(k)) in equation (1), the α_(k), β_(k), and γ_(k) ^((i)) in equations (2), (3), and (4) must be calculated first. A sequential-calculation decoder needs 5*(N+M) units of time for this (G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, “VLSI Architectures for Turbo Codes,” IEEE Trans. on VLSI Systems, vol. 7, no. 3, pp. 369-379, September 1999).

[0051] In order to increase the calculation speed and thus the throughput, a preferred embodiment of the present invention adopts the parallel and systolic-array VLSI structure design. The whole decoder circuit is composed of P levels of parallel decoding units. The structure is shown in FIG. 3. A serial-in parallel-out unit is placed before the first level to receive the messages r_(1s,k), r_(1p,k) and r_(2p,k), where the subscript k=0, 1, . . . , N+M−1 covers the whole block and the end message. Its output is sent to the first-level decoding unit. The other input of the first-level decoding unit is L_(a,k), which is the parameter obtained by deinterleaving the previous-level extrinsic parameter Λ(d_(k)); the initial (level 0) value of the extrinsic parameter is set as L_(a0,k)=(0, 0, . . . , 0). The first-level extrinsic parameter L_(a1,k) is generated by the first-level decoding unit, and the messages r_(1s,k), r_(1p,k), and r_(2p,k) are passed through sequentially to be the input of the next level.

[0052] Each level of the decoding unit comprises two decoders. These two decoders are the first decoder and the second decoder shown in FIG. 4, where the structure of the first decoder is similar to that of the second decoder. The whole systolic-array VLSI structure is shown in FIG. 5. N and M can be adjusted according to the design requirement. For ease of description, the block length N=4 and the register size M=3 are used as an example. FIG. 6 schematically shows the structure of the simplified modules, data streams, and latches. It is apparent to those skilled in the art that this embodiment is only an example and does not limit the scope of application of the present invention.

[0053] According to the literature (I. L. Turner, “A Modified BAHL Algorithm for Recursive System Convolutional Codes on Rayleigh Fading Channels,” IEEE 49th Vehicular Technology Conference, pp. 75-76, vol. 1, 1999), the a priori probability of the input bit d_(k) calculated by the previous-level decoder is represented as

$$\Pr\{s_k=m \mid s_{k-1}=m'\} = \frac{e^{L(d_k)}}{1+e^{L(d_k)}}, \quad \text{if } q(d_k=1 \mid s_k=m,\, s_{k-1}=m')=1 \qquad (5)$$

$$\Pr\{s_k=m \mid s_{k-1}=m'\} = 1-\frac{e^{L(d_k)}}{1+e^{L(d_k)}} = \frac{1}{1+e^{L(d_k)}}, \quad \text{if } q(d_k=0 \mid s_k=m,\, s_{k-1}=m')=1 \qquad (6)$$
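Equations (5) and (6) convert a log-likelihood ratio from the previous level into the two a priori probabilities. A minimal Python sketch is given below; the function name `prior_from_llr` is hypothetical, and the sign-dependent branch is an implementation choice made here to avoid floating-point overflow for large magnitudes, not something stated in the text.

```python
import math


def prior_from_llr(L):
    """Return (P(d_k = 0), P(d_k = 1)) for an extrinsic LLR value L."""
    if L >= 0:
        p1 = 1.0 / (1.0 + math.exp(-L))  # same value as e^L / (1 + e^L)
    else:
        e = math.exp(L)
        p1 = e / (1.0 + e)               # equation (5)
    return 1.0 - p1, p1                  # equation (6), then equation (5)


p0, p1 = prior_from_llr(0.0)  # initial level: L = 0 gives equal priors
```

With the initial extrinsic value L=0 both priors are 0.5, which matches the all-zero initialization of the first level.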

[0054] Here, L(d_(k)) is the log-likelihood ratio (LLR) extrinsic parameter calculated for the message bit d_(k) by the previous-level decoder. Assuming an AWGN channel, the conditional probabilities in equation (4) are calculated as follows:

$$p(r_{1s,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1s}} \exp\left[\frac{-(r_{1s,k}-\mu_{r1s})^2}{2\sigma_{r1s}^2}\right] \qquad (7)$$

$$p(r_{1p,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1p}} \exp\left[\frac{-(r_{1p,k}-\mu_{r1p}(m',m))^2}{2\sigma_{r1p}^2}\right] \qquad (8)$$

[0055] Here, μ_(r1s) and μ_(r1p)(m′,m) are the expectation values of r_(1s) and r_(1p) respectively. Of these, μ_(r1s) depends on the input bit, and μ_(r1p)(m′,m) depends on the input bit and is also affected by the previous state and the current state. σ_(r1s) ² and σ_(r1p) ² are the variances of r_(1s) and r_(1p) respectively. It is assumed that the variances of r_(1s) and r_(1p) are the same, denoted σ². Therefore, the above two equations can be multiplied and consolidated as follows:

$$p(r_{1s,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') \cdot p(r_{1p,k} \mid d_k=i,\, s_k=m,\, s_{k-1}=m') = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}-\mu_{r1s})^2 + (r_{1p,k}-\mu_{r1p}(m',m))^2}{\sigma^2}\right] \qquad (9)$$

[0056] For a discrete memoryless Gaussian channel, the branch probability parameter γ_(k) ⁽¹⁾ or γ_(k) ⁽⁰⁾, for an input bit of 1 or 0 respectively, can be calculated from equations (4), (5), (6), and (9) as follows:

$$\gamma_k^{(1)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}-1)^2 + (r_{1p,k}-\mu_{r1p}(m',m))^2}{\sigma^2}\right] \cdot \frac{e^{L(d_k)}}{1+e^{L(d_k)}} \qquad (10)$$

$$\gamma_k^{(0)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}+1)^2 + (r_{1p,k}-\mu_{r1p}(m',m))^2}{\sigma^2}\right] \cdot \frac{1}{1+e^{L(d_k)}} \qquad (11)$$

[0057] According to equations (10) and (11), the branch probability parameters γ_(k) ^((i))(m′,m) can be calculated in parallel. N+M units of module C (as shown in FIG. 7) are used to calculate all γ_(k) ^((i))(m′, m) in parallel. Thus, the N+M units of time can be shortened to one unit of time. The input signals of module C in FIG. 7 are L_(a,k), r_(1s,k) and r_(1p,k), where k=1, . . . , N+M. Module C calculates γ_(k) ⁽¹⁾(m′,m) and γ_(k) ⁽⁰⁾(m′,m) respectively.
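The calculation performed by one module C for a single trellis branch can be sketched in Python as below, following equations (10) and (11). The function name `branch_gamma` and its argument names are hypothetical; `mu_r1p` stands for μ_(r1p)(m′,m), which the caller is assumed to look up from the trellis, and `sigma2` is the common variance σ². The function should only be evaluated for the bit value that the branch (m′, m) actually carries, since q is 0 otherwise.

```python
import math


def branch_gamma(bit, r1s, r1p, mu_r1p, L, sigma2):
    """Return gamma_k^(bit)(m', m) for one branch, per equations (10), (11)."""
    # A priori probability of the bit from the extrinsic LLR, eqs. (5), (6).
    p1 = 1.0 / (1.0 + math.exp(-L)) if L >= 0 else math.exp(L) / (1.0 + math.exp(L))
    prior = p1 if bit == 1 else 1.0 - p1
    mu_r1s = 2.0 * bit - 1.0  # +1 for bit 1, -1 for bit 0
    dist2 = (r1s - mu_r1s) ** 2 + (r1p - mu_r1p) ** 2
    return math.exp(-0.5 * dist2 / sigma2) / (2.0 * math.pi * sigma2) * prior


# Example values, assumed for illustration only.
g1 = branch_gamma(1, r1s=1.0, r1p=-1.0, mu_r1p=-1.0, L=0.0, sigma2=1.0)
g0 = branch_gamma(0, r1s=1.0, r1p=-1.0, mu_r1p=-1.0, L=0.0, sigma2=1.0)
```

Because every call depends only on the inputs for its own time step k, the N+M evaluations are independent of one another, which is what allows them to be carried out in parallel.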

[0058] In addition, the forward recursive probability parameter α_(k) output from the previous level is the input of the next level, and the backward recursive probability parameter β_(k) output from the next level is the input of the previous level, so the calculation is well suited to a systolic-array VLSI design that increases the calculation speed. According to equation (2), N+M units of module A (as shown in FIG. 8) are used to calculate α_(k). The first-level inputs γ₁ ⁽¹⁾(m′,m) and γ₁ ⁽⁰⁾(m′,m) and the initial value of the forward recursive probability parameter α₀(m) are used to calculate α₁(m). The second-level inputs γ₂ ⁽¹⁾(m′,m) and γ₂ ⁽⁰⁾(m′,m) and α₁(m) are used to calculate α₂(m). Thus, the systolic array is able to work simultaneously. All α_(k)(m), where k=1, . . . , N+M, can be calculated after N+M units of time.

[0059] According to equation (3), N+M units of module B (as shown in FIG. 9) are used to calculate β_(k). The first-level inputs γ_(N+M) ⁽¹⁾(m′,m) and γ_(N+M) ⁽⁰⁾(m′,m) and the initial value of the backward recursive probability parameter β_(N+M)(m) are used to calculate β_(N+M−1)(m). The second-level inputs γ_(N+M−1) ⁽¹⁾(m′,m) and γ_(N+M−1) ⁽⁰⁾(m′,m) and the backward recursive probability parameter β_(N+M−1)(m) are used to calculate β_(N+M−2)(m). The advantage is that the structure of each module is the same, and the output of the previous level is the input of the next level. Thus, the throughput is (N+M) times the original throughput.

[0060] When the calculation of α_(k), β_(k) and γ_(k) ^((i)) is completed, N+M units of module D (as shown in FIG. 10) are used to calculate Λ(d_(k)) according to equation (1). By using parallel calculation, the N+M units of time are shortened to one unit of time.

[0061] The submodule L located between module A and module B calculates the sum of products of two inputs. In the example shown in FIG. 11, the submodule L adopts an analog circuit provided by conventional techniques. The analog circuits proposed in the reference literature can also be used, such as H.-A. Loeliger, F. Lustenberger, F. Tarkoy, M. Helfenstein, “Decoding in Analog VLSI,” IEEE Communications Magazine, Vol. 37(4), pp. 99-101, April 1999; H.-A. Loeliger, F. Lustenberger, M. Helfenstein, F. Tarkoy, “Probability Propagation and Decoding in Analog VLSI,” IEEE Trans. on Information Theory, Vol. 47(2), pp. 837-843, February 2001; or F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarkoy, G. S. Moschytz, “An Analog VLSI Decoding Technique for Digital Codes,” ISCAS '99, Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 424-427, 1999, etc.

[0062] For ease of describing the detailed structure of modules A, B, and D mentioned above, the preferred embodiment of the present invention uses the turbo-code of the third-generation CDMA mobile communication standard as an example. However, this example does not limit the scope of application of the present invention. The turbo-code of the third-generation CDMA mobile communication standard has a decoder register size M=3. For the first decoder and the second decoder, the code rate is R=1/3, and the parameters of the feedback generator and the direct feed-forward generator are G_(b)=1011 and G_(d)=1110 respectively. FIG. 12 shows the recursive systematic convolutional (RSC) encoder, which adopts the fast RSC encoder; for the details of the fast RSC encoder, please refer to the “Fast Turbo-code Encoder” proposed by the same inventor of the present invention in April, 2001. The trellis diagram is shown in FIG. 13.

[0063] FIG. 6 schematically shows the structure of the simplified modules, data streams, and latches when the block length N=4 and the register size M=3. There are N+M=7 units each of modules A, B, C, and D. In the first unit of time, the parallel input signals L_(a,k), r_(1s,k) and r_(1p,k), k=1, 2, . . . , 6, 7, are used simultaneously to calculate γ₁ ^((i)), γ₂ ^((i)), . . . , γ₇ ^((i)). In the following 7 units of time, α₁, α₂, . . . , α₆ and β₁, β₂, . . . , β₆ are calculated respectively. In one further unit of time, according to equation (1), the parallel inputs γ_(k) ⁽¹⁾(m′,m), γ_(k) ⁽⁰⁾(m′,m), α_(k−1) and β_(k) are used to calculate Λ(d_(k)). Λ(d_(k)) is used as the extrinsic parameter of the next level. When the last level is reached, d_(k) is determined accordingly: if Λ(d_(k))>0, then d_(k)=1; otherwise d_(k)=0.

[0064] According to the trellis diagram of FIG. 13, it is easy to simplify the structure of modules A, B, and D. FIG. 14 schematically shows the detailed structure of module A based on this design. The detailed structure of module B is similar to that of module A. The detailed structure of module D is shown in FIG. 15.

[0065] As shown in FIG. 16, the latency for completing a message of one block length with the parallel and systolic-array VLSI structure design of the preferred embodiment according to the present invention is N+M+2 units of time. Compared with the conventional sequential calculation structure, which needs 5*(N+M) units of time, the time is shortened to about ⅕. Furthermore, the systolic-array VLSI structure is able to generate a set of d_(k) in every unit of time after the first set of d_(k) is generated.
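The latency figures can be checked with a short Python calculation. The function names are hypothetical; the split of the systolic latency into one unit for γ, N+M units for the α and β sweeps, and one unit for Λ follows the schedule described for FIG. 6.

```python
def sequential_latency(n, m):
    """Units of time for the sequential structure: 5*(N+M)."""
    return 5 * (n + m)


def systolic_latency(n, m):
    """One unit for gamma, N+M units for alpha/beta, one unit for Lambda."""
    return 1 + (n + m) + 1


for n, m in [(4, 3), (65536, 3)]:
    seq = sequential_latency(n, m)
    sys_units = systolic_latency(n, m)
    print(n, m, seq, sys_units, round(sys_units / seq, 4))
```

For N=4 and M=3 the two structures need 35 and 9 units of time respectively; the ratio approaches ⅕ as N grows, for example 327695 versus 65541 units for N=65536.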

[0066] The performance comparison is shown in Table 1:

TABLE 1 The structure comparison of the systolic array and the sequential type

Item/Structure          | Sequential Structure | Systolic Array Structure | Pro and Con
Latency                 | 5*(N + M)            | (N + M) + 2              | The latency is about ⅕
Output Time             | 5*(N + M)            | 1                        | The throughput is about 5*(N + M) times
Number of Hardware Gate | 1                    | 5*(N + M)                | The complexity of the circuit is about 5*(N + M) times

[0067] To demonstrate the error-correcting performance of the preferred embodiment of the present invention, the CDMA mobile communication system mentioned above is used as an example. The RSC decoder with register size M=3 is shown in FIG. 12. The trellis diagram is shown in FIG. 13. The iterative decoding number is P=6. The random interleaving method is adopted between the first decoder and the second decoder. The simulation result is shown in FIG. 17, wherein the block length N=65536, the vertical axis is the decoding performance expressed as the bit error rate (BER), and the horizontal axis is the communication environment expressed as the signal/noise ratio. As can be seen, at the same signal/noise ratio, the larger the iterative decoding number, the better the decoding performance. This accords with the theory, and is similar to the simulation results disclosed in the literature: C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993, and P. Robertson, “Illuminating the Structure of Code and Decoder of Parallel Concatenated Recursive Systematic (Turbo) Codes,” in Proc. IEEE GLOBECOM Conf., San Francisco, Calif., pp. 1298-1303, December 1994.

[0068] The present simulation is written in the C programming language and runs on a personal computer with a GenuineIntel Pentium® III CPU and 128 MB of RAM, under the Windows Me® operating system. For the bit error rate comparison shown in FIG. 17, the iterative decoding number is p=1, . . . , 6, the code ratio is R=1/3, the register size is M=3, the generator parameters are G_(b)=1011 and G_(d)=1110, and the 256*256 random interleaving and deinterleaving method is used.
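The specification does not state how the random interleaver was generated. As one possible reading, a 256*256 random interleaver can be taken to be a pseudo-random permutation of 65536 positions, which matches the block length N=65536. The Python sketch below illustrates that reading only; the seed, the function names, and the use of Python (the simulation itself was written in C) are assumptions of this sketch.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so both decoders share one pattern
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def interleave(values, perm):
    """Output position i takes the input value at position perm[i]."""
    return [values[p] for p in perm]


def deinterleave(values, perm):
    """Inverse of interleave: put each value back at its original position."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out
```

A call such as `make_interleaver(256 * 256)` yields one permutation that the interleaving unit and the deinterleaving unit would both use, so that deinterleaving exactly undoes interleaving.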

[0069] The present invention provides a fast turbo-code decoder, wherein the decoder is designed to use systolic array VLSI circuits. Since the output of the previous level can be used as the input of the next level, the advantages of parallel and pipelined calculation are fully achieved. The latency is only N+M+2 units of time, which is about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times higher than that of the conventional decoder. Although the number of circuit gates is about 5*(N+M) times higher than that of the conventional decoder, VLSI techniques have improved steadily, so the hardware complexity is easy to overcome. Spending hardware cost to obtain higher speed will remain a lasting trend.

[0070] Although the invention has been described with reference to a particular embodiment thereof, it will be apparent to one of ordinary skill in the art that modifications to the described embodiment may be made without departing from the spirit of the invention. Accordingly, the scope of the invention will be defined by the attached claims, not by the above detailed description.

What is claimed is:
 1. A turbo-code decoder for a communication system, the decoder comprising: a serial-to-parallel output unit, used to receive a serial input signal and output a parallel signal after converting the serial input signal; and a plurality of parallel decoding units, wherein the parallel decoding units are serially connected to form a plurality of levels, the first level parallel decoding unit receives the parallel signal that is output from the serial-to-parallel output unit, the output from the first level parallel decoding unit is sent to the second level parallel decoding unit, and, in a certain sequence, the parallel signal passes through the parallel decoding units for the decoding process.
 2. The turbo-code decoder of claim 1, wherein each of the parallel decoding units receives an extrinsic parameter when performing the decoding process, the extrinsic parameter being the signal output after the decoding process of a parallel decoding unit, and sends the extrinsic parameter to the next level of the parallel decoding units.
 3. The turbo-code decoder of claim 2, wherein the extrinsic parameter is obtained from a deinterleaving operation, and the extrinsic parameter of the first level parallel decoding unit is L_(a0,k)=(0, 0 . . . , 0), where k=1, 2, . . . , N, and N is the block length of the turbo-code.
 4. The turbo-code decoder of claim 1, wherein the serial input signal comprises the r_(1s,k), r_(1p,k), and r_(2p,k) messages of the turbo-code, where k=1, 2, . . . , N, and N is the block length of the turbo-code.
 5. The turbo-code decoder of claim 4, wherein the serial-to-parallel output unit receives the r_(1s,k), r_(1p,k), and r_(2p,k), wherein the subscript k=0, 1, . . . , N+M−1 represents the whole block and an end message, wherein M stands for a total number of latch units of the turbo-code decoder, the serial-to-parallel output unit converts the received r_(1s,k), r_(1p,k), and r_(2p,k) messages and outputs the results to the first level parallel decoding unit in parallel, the first level parallel decoding unit also receives an extrinsic parameter L_(a,k) at the same time, the parameter L_(a,k) is obtained via a deinterleaving operation on the previous level extrinsic parameter Λ(d_(k)), the initial value of the first level decoding unit extrinsic parameter is set as L_(a0,k)=(0, 0 . . . , 0), a first level extrinsic parameter L_(a1,k) is generated via the first level parallel decoding unit, and the messages r_(1s,k), r_(1p,k) and r_(2p,k) pass through sequentially to be the input of the next level.
 6. The turbo-code decoder of claim 5, wherein the parallel decoding unit comprises: a first decoder, used to receive the r_(1s,k), r_(1p,k) messages and the extrinsic parameter L_(a,k); a second decoder, used to receive the r_(2p,k) message and the extrinsic parameter L_(a,k); an interleaving unit, located between the first decoder and the second decoder, used to receive the output of the first decoder; and a deinterleaving unit, connected to the second decoder, that alternately outputs the output of the first decoder and the second decoder.
 7. The turbo-code decoder of claim 6, wherein the first decoder of the parallel decoding units constitutes a systolic array very large scale integrated (VLSI) circuit structure.
 8. The turbo-code decoder of claim 7, wherein the systolic array VLSI circuit structure is composed of N+M units of each of the modules C, A, B, D, and E, wherein the module C receives L_(a1,k), r_(1s,k) and r_(1p,k), and outputs γ_(k) ⁽¹⁾(m′,m) and γ_(k) ⁽⁰⁾(m′,m), the module A calculates a forward recursive probability parameter α_(k), the module B calculates a backward recursive probability parameter β_(k), the module D adopts (N+M) units of parallel calculation to obtain the Λ(d_(k)) after the calculation of the α_(k), β_(k), and γ_(k) ^((i)) is finished, and the module E outputs the value of the calculation from the module D, where k=1, 2, . . . , N+M.
 9. The turbo-code decoder of claim 8, wherein the value of the Λ(d_(k)) is calculated according to a MAP algorithm and the following equation:
$\Lambda(d_k) = \log \frac{\sum_{m} \sum_{m'} \gamma_k^{(1)}(m',m) \cdot \alpha_{k-1}(m') \cdot \beta_k(m)}{\sum_{m} \sum_{m'} \gamma_k^{(0)}(m',m) \cdot \alpha_{k-1}(m') \cdot \beta_k(m)},$
wherein α_(k) is the forward recursive probability parameter, β_(k) is the backward recursive probability parameter, and γ_(k) ^((i)) is a branch probability parameter.
 10. The turbo-code decoder of claim 9, wherein the forward recursive probability parameter α_(k) is obtained from the calculation of the previous parameter α_(k−1) and the branch probability parameter γ_(k) ^((i)), and the equation is as follows:
$\alpha_k(m) = \frac{\sum_{m'} \sum_{i=0}^{1} \gamma_k^{(i)}(m',m) \cdot \alpha_{k-1}(m')}{\sum_{m} \sum_{m'} \sum_{i=0}^{1} \gamma_k^{(i)}(m',m) \cdot \alpha_{k-1}(m')}$


11. The turbo-code decoder of claim 9, wherein the backward recursive probability parameter β_(k) is obtained from the calculation of the next parameter β_(k+1) and the branch probability parameter γ_(k+1) ^((i)), and the equation is as follows:
$\beta_k(m) = \frac{\sum_{m'} \sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m) \cdot \beta_{k+1}(m')}{\sum_{m} \sum_{m'} \sum_{i=0}^{1} \gamma_{k+1}^{(i)}(m',m) \cdot \beta_{k+1}(m')}$


12. The turbo-code decoder of claim 9, wherein the branch probability parameter γ_(k) ^((i)) is obtained from the following equation according to the MAP algorithm: γ_(k) ^((i))(m′,m)=p(r_(1s,k)|d_(k)=i,s_(k)=m,s_(k−1)=m′)·p(r_(1p,k)|d_(k)=i,s_(k)=m,s_(k−1)=m′)·q(d_(k)=i|s_(k)=m,s_(k−1)=m′)·Pr{s_(k)=m|s_(k−1)=m′}, wherein the probability parameter q(d_(k)=i|s_(k)=m,s_(k−1)=m′) is 0 or 1 depending on whether the input bit d_(k)=i, being 0 or 1, is associated with the transition from the state m′ to the state m.
 13. The turbo-code decoder of claim 11, wherein, assuming an AWGN channel, the probabilities are calculated as follows:
$p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1s}} \exp\left[\frac{-(r_{1s,k} - \mu_{r1s})^2}{2\sigma_{r1s}^2}\right]$
$p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{\sqrt{2\pi}\,\sigma_{r1p}} \exp\left[\frac{-(r_{1p,k} - \mu_{r1p}(m',m))^2}{2\sigma_{r1p}^2}\right],$
wherein μ_(r1s) and μ_(r1p)(m′,m) are the expectation values of r_(1s) and r_(1p), respectively, μ_(r1s) depends on the input bit, μ_(r1p)(m′,m) depends on the input bit and is also affected by the previous state and the current state, and σ_(r1s) ² and σ_(r1p) ² are the variances of r_(1s) and r_(1p), respectively.
 14. The turbo-code decoder of claim 12, wherein, assuming that the variances of r_(1s) and r_(1p) are the same, the above two equations can be multiplied and consolidated as follows:
$p(r_{1s,k} \mid d_k = i, s_k = m, s_{k-1} = m') \cdot p(r_{1p,k} \mid d_k = i, s_k = m, s_{k-1} = m') = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2} \cdot \frac{(r_{1s,k} - \mu_{r1s})^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2}{\sigma^2}\right]$


15. The turbo-code decoder of claim 11, wherein, assuming a discrete memoryless Gaussian channel, the branch probability parameter γ_(k) ⁽¹⁾ or γ_(k) ⁽⁰⁾ for the input bit being 1 or 0 can be calculated from the following equations:
$\gamma_k^{(1)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2} \cdot \frac{(r_{1s,k} - 1)^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2}{\sigma^2}\right] \cdot \frac{e^{L(d_k)}}{1 + e^{L(d_k)}}$
$\gamma_k^{(0)}(m',m) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2} \cdot \frac{(r_{1s,k} + 1)^2 + (r_{1p,k} - \mu_{r1p}(m',m))^2}{\sigma^2}\right] \cdot \frac{1}{1 + e^{L(d_k)}}$


16. The turbo-code decoder of claim 5, wherein N=4 and the register size M=3, and the simplified modules, a data stream, and a latch structure are as shown in FIG. 6.
 17. The turbo-code decoder of claim 5, wherein the a priori probability of the input bit d_(k) calculated by the previous level parallel decoding unit can be used by the next level decoder.
 18. The turbo-code decoder of claim 5, wherein L(d_(k)) is the log likelihood ratio (LLR) extrinsic parameter calculated from the message bit d_(k) by the previous level decoder.